Byte
-
Is the smallest addressable unit of memory on a system.
Size
-
A byte is not always the same as a u8 (unsigned 8-bit integer), although the two are often treated as equivalent on modern systems.
-
On almost all modern hardware:
-
1 byte = 8 bits
-
-
So a byte happens to match a u8
-
That’s why in practice people often treat them as equivalent.
-
-
Its size is defined by the architecture, not the language.
-
6-bit byte (character) systems:
-
IBM 1401
-
CDC 6600
-
Reason:
-
Early character sets fit in 6 bits (64 symbols)
-
Optimized for text and business data
-
-
-
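9-bit byte systems:
-
DEC PDP-10
-
Reason:
-
Used 36-bit words, which divide evenly into 4 × 9-bit bytes.
-
Byte size was not fixed: the PDP-10's byte instructions accepted any width from 1 to 36 bits, and text was also commonly packed as five 7-bit ASCII characters per word.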
-
-
-
Virtually all general-purpose CPUs use 8-bit bytes.
-
This is standardized in practice because:
-
7-bit ASCII was extended to 8-bit character sets
-
Hardware, networking, storage all aligned around 8 bits
-
C/C++ standard
-
sizeof(char) == 1 → this is a byte
-
But a byte is only guaranteed to be at least 8 bits (CHAR_BIT >= 8), not exactly 8.
-
Theoretically:
-
1 byte could be 16 bits
-
Then:
-
char = 1 byte = 16 bits
-
u8 = 8 bits → not the same thing
-
-
-
uint8_t exists only if the platform actually supports an exact 8-bit type
-
char is not guaranteed to be exactly 8 bits
Rust
-
u8 is always 8 bits
-
u8 is effectively the language’s “byte”
-
Rust assumes 8-bit bytes for supported platforms
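The Rust guarantees above can be checked directly. A minimal sketch using only the standard library; the helper `bits_in` is a made-up name for illustration and bakes in the 8-bit-byte assumption:

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Number of bits in a value of type `T`, assuming 8-bit bytes
/// (the assumption Rust makes for its supported platforms).
fn bits_in<T>() -> usize {
    size_of::<T>() * 8
}

fn main() {
    // u8 is exactly 8 bits and occupies exactly one byte.
    assert_eq!(u8::BITS, 8);
    assert_eq!(size_of::<u8>(), 1);

    // Rust's view of C's `char` is also one byte wide.
    assert_eq!(size_of::<c_char>(), 1);

    println!("u8 = {} bits, u64 = {} bits", bits_in::<u8>(), bits_in::<u64>());
}
```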
word, dword, qword
-
A word is the natural data size of a CPU: the size it processes most efficiently.
-
Historically: tied directly to register width.
-
Today: still loosely tied to architecture, but terminology is often legacy-driven.
-
The CPU/ISA defines them. They are not defined by the OS.
-
word = 16 bits (original 16-bit CPUs like the 8086)
-
dword = “double word” = 32 bits
-
qword = “quad word” = 64 bits
-
Example x86 assembly:
mov eax, dword ptr [rbx]  ; load 32 bits (a dword) from the address in rbx into eax
Name confusion
-
These meanings are not universal, just widely adopted due to x86.
-
Architectural definition
-
A word = register size
-
16-bit CPU → 16-bit word
-
32-bit CPU → 32-bit word
-
64-bit CPU → 64-bit word
-
-
x86 legacy usage:
-
word = 16 bits (even on 64-bit CPUs)
-
dword = 32 bits
-
qword = 64 bits
-
-
So on modern x86, a “word” is not the natural CPU size anymore
-
Low-level languages usually avoid ambiguity:
-
C/C++:
-
Avoid “word” entirely
-
Use uint32_t, uint64_t, etc.
-
-
Odin / Rust:
-
Explicit sizes (u8, u16, u32, u64)
-
No reliance on “word” terminology
-
-
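A small sketch of what "explicit sizes" buys you. The aliases `Word`, `Dword` and `Qword` are hypothetical, added here only to mirror the x86 names; Rust itself defines no such types. `usize` is pointer-sized, which on common targets is the closest thing to the architectural word:

```rust
use std::mem::size_of;

// Hypothetical aliases mirroring x86 assembler terminology.
type Word = u16;
type Dword = u32;
type Qword = u64;

/// Sizes in bytes of the x86-style word, dword and qword.
fn x86_sizes() -> (usize, usize, usize) {
    (size_of::<Word>(), size_of::<Dword>(), size_of::<Qword>())
}

fn main() {
    // These sizes are fixed by the type names, whatever the target CPU is.
    println!("word/dword/qword in bytes: {:?}", x86_sizes());

    // usize follows the target's pointer width instead.
    println!("usize is {} bits on this target", usize::BITS);
}
```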
Register Efficiency
-
"Is using a byte (u8) on a 64-bit system less efficient than using a u64?"
-
Using u8 is not inherently inefficient on a 64-bit system, but there are cases where u64 is faster.
-
A 64-bit CPU (e.g., x86-64, ARM64) is optimized for 64-bit registers, so:
-
Operations on u64:
-
Usually map directly to single instructions
-
Fully utilize registers
-
-
Operations on u8:
-
Often get promoted to 32 or 64 bits internally
-
May involve extra masking or extension instructions
-
-
-
So for pure arithmetic:
-
u64 → often more efficient
-
u8 → sometimes slightly less efficient
-
-
Memory is where u8 can actually be more efficient:
-
u8 uses 8× less memory than u64
-
Smaller data:
-
Better cache utilization
-
Fewer cache misses
-
Higher bandwidth efficiency
-
-
Example:
-
Processing a large array:
-
u8[] → more data fits in cache → often faster overall
-
u64[] → fewer elements per cache line
-
-
Modern CPUs use SIMD heavily:
-
With u8:
-
You can process 16–64 elements per instruction (16 with 128-bit SSE/NEON, 32 with AVX2, 64 with AVX-512)
-
-
With u64:
-
Only 2–8 elements per instruction with the same vector widths
-
-
-
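The memory, cache and SIMD figures above are simple arithmetic, sketched below. The 64-byte cache line is an assumption (typical, but not universal), and the helper names are made up for illustration. `sum_bytes` shows the usual compromise: narrow storage with a wide accumulator.

```rust
use std::mem::size_of;

/// Assumed cache-line size in bytes (typical, but CPU-dependent).
const CACHE_LINE_BYTES: usize = 64;

/// How many values of type `T` fit in one cache line.
fn per_cache_line<T>() -> usize {
    CACHE_LINE_BYTES / size_of::<T>()
}

/// How many lanes of type `T` fit in a SIMD register of `vector_bits` bits.
fn simd_lanes<T>(vector_bits: usize) -> usize {
    vector_bits / (size_of::<T>() * 8)
}

/// Sum bytes into a 64-bit accumulator: data stays compact in memory,
/// arithmetic happens at register width.
fn sum_bytes(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    let n: usize = 1_000_000;
    println!("u8 array:  {} bytes", n * size_of::<u8>());
    println!("u64 array: {} bytes", n * size_of::<u64>());
    println!(
        "elements per cache line: u8 = {}, u64 = {}",
        per_cache_line::<u8>(),
        per_cache_line::<u64>()
    );
    for bits in [128usize, 256, 512] {
        println!(
            "{}-bit vector: {} x u8 or {} x u64",
            bits,
            simd_lanes::<u8>(bits),
            simd_lanes::<u64>(bits)
        );
    }
    let data = vec![1u8; n];
    println!("sum of {} bytes = {}", n, sum_bytes(&data));
}
```

Whether the u8 version is actually faster depends on the workload; measure before committing to one layout.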
CPU Architectures
| Term | Architecture | Bits | Notes |
| ----- | -------------- | ------ | ---------------------- |
| x86 | Intel (legacy) | 32-bit | Original PC standard |
| x64 | x86-64 | 64-bit | Extension of x86 |
| amd64 | x86-64 | 64-bit | Same as x64 (AMD name) |
| arm64 | ARM (AArch64) | 64-bit | Different ISA entirely |
x86 (32-bit)
-
In current usage, “x86” usually means the 32-bit variant (IA-32), although the family began with the 16-bit 8086
-
Key properties:
-
32-bit registers (EAX, EBX, etc.)
-
32-bit address space (~4 GB limit)
-
Complex instruction set (CISC)
-
-
When someone says:
-
“x86 build” → usually means 32-bit binary
-
-
Why is it called x86?
-
Refers to the classic Intel architecture starting from:
-
8086 → 80286 → 80386 → 80486 → Pentium...
-
-
Instead of listing all of them, people started referring to the whole family as:
-
“x86” = any processor in the *86 family
-
The “x” is just a wildcard.
-
x64 / amd64 (64-bit x86)
-
These are the same thing.
-
AMD created the 64-bit extension to x86
-
Called it amd64
-
Intel adopted it (called it Intel 64)
-
So:
-
x64 = amd64 = x86-64
-
Key properties:
-
64-bit registers (RAX, RBX, etc.)
-
Much larger address space (64-bit pointers, though current CPUs implement fewer virtual address bits, typically 48)
-
Backward compatible with 32-bit x86
-
ARM64 (AArch64)
-
Completely different architecture from x86.
-
Designed by ARM Holdings
-
Used in:
-
Phones
-
Tablets
-
Apple Silicon (M1/M2/M3)
-
Many servers now
-
-
Key properties:
-
64-bit only (in ARM64 mode)
-
Simpler instruction set (RISC)
-
Different registers and instructions from x86
-
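A program can ask which of these architectures it was compiled for. A minimal sketch: `arch_label` is a hypothetical helper that maps the compiler's target names onto the common labels used in these notes, while `cfg!(target_arch = ...)` and `std::env::consts::ARCH` are standard Rust.

```rust
/// Common label for the architecture this binary was compiled for.
/// The label strings are this sketch's own choice.
fn arch_label() -> &'static str {
    if cfg!(target_arch = "x86") {
        "x86 (32-bit)"
    } else if cfg!(target_arch = "x86_64") {
        "x64 / amd64"
    } else if cfg!(target_arch = "aarch64") {
        "arm64"
    } else {
        "other"
    }
}

fn main() {
    // The compiler's own name for the target, e.g. "x86_64" or "aarch64".
    println!("target arch:   {}", std::env::consts::ARCH);
    println!("common label:  {}", arch_label());
    println!("pointer width: {} bits", usize::BITS);
}
```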
What the CPU provides vs Typing
-
u8, u16, u32, u64
-
i8, i16, i32, i64
-
int, uint
-
bool
-
pointers
-
struct
-
etc
-
These types are ways for a programming language to describe how many bits to use and how to interpret them.
-
The CPU itself only sees bits and instructions.
-
The hardware supports multiple widths, but it does not define “types”
-
A CPU (ISA) defines:
-
Register sizes (e.g., 64-bit registers on x64)
-
Operand widths (8, 16, 32, 64-bit operations)
-
Operations: add, sub, mul, load/store, etc.
-
-
Example (x86-64 idea):
-
add rax, rbx → 64-bit add
-
add eax, ebx → 32-bit add
-
add al, bl → 8-bit add
-
-
The CPU doesn’t inherently “know” signed vs unsigned
-
Difference comes from which instructions you use:
-
Unsigned division → div
-
Signed division → idiv
-
-
Signed comparisons vs unsigned comparisons → different opcodes
-
Signedness is a semantic layer imposed by the compiler, not a stored property.
-
-
For pointers, the CPU just sees a number:
-
32-bit → pointer = 32-bit integer
-
64-bit → pointer = 64-bit integer
-
There's no special “pointer hardware type”.
-
-
A struct is a group of fields with layout rules; a memory pattern. The CPU just sees contiguous memory.
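The points about signedness, pointers and structs can be seen from the language side. A rough sketch: `Packet` and the two `halve_*` helpers are made up for illustration, and the padding noted in the comments assumes a target where u32 is 4-byte aligned (true of mainstream platforms).

```rust
use std::mem::size_of;

/// Fields laid out in declaration order using C layout rules.
#[allow(dead_code)]
#[repr(C)]
struct Packet {
    tag: u8,     // 1 byte, followed by 3 bytes of padding
    length: u32, // 4 bytes, must start on a 4-byte boundary
    id: u16,     // 2 bytes, followed by 2 bytes of tail padding
}

/// Interpret the 32 bits as signed, then divide (signed division).
fn halve_as_signed(bits: u32) -> i32 {
    (bits as i32) / 2
}

/// Interpret the same 32 bits as unsigned, then divide (unsigned division).
fn halve_as_unsigned(bits: u32) -> u32 {
    bits / 2
}

fn main() {
    // One bit pattern, two interpretations chosen by the type.
    let bits: u32 = 0xFFFF_FFF0;
    println!("as u32: {}, as i32: {}", bits, bits as i32);
    println!(
        "halved unsigned: {}, halved signed: {}",
        halve_as_unsigned(bits),
        halve_as_signed(bits)
    );

    // A pointer is an address-sized number.
    let value = 42u64;
    let addr = &value as *const u64 as usize;
    println!("address: {:#x} ({} bytes wide)", addr, size_of::<*const u64>());

    // A struct is a layout: field sizes plus padding.
    println!("size of Packet: {} bytes", size_of::<Packet>());
}
```

The `as i32` cast changes only how the compiler treats the bits afterwards, which is why the two halving functions give different results for the same input.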